Constraint-Based Reasoning
a novel constraint optimization method to encode the generic knowledge into a BN without requiring any training data
Our proposed approach can be applied to other AUs as well. In Tab.6, LP-SM also considers apex frames on CK+, and The comparison to LP-SM is consistent. In Tab.8, we apply FMPN-FER and DeepEmotion to our pre-processed We will consider a pre-trained VGGFace model in our further work. R2 2.1 The novelty compared to prior work. Facial expression can be a group of AUs.
Ensuring Safety in an Uncertain Environment: Constrained MDPs via Stochastic Thresholds
This paper studies constrained Markov decision processes (CMDPs) with constraints against stochastic thresholds, aiming at safety of reinforcement learning in unknown and uncertain environments. We leverage a Growing-Window estimator sampling from interactions with the uncertain and dynamic environment to estimate the thresholds, based on which we design Stochastic Pessimistic-Optimistic Thresholding (SPOT), a novel model-based primal-dual algorithm for multiple constraints against stochastic thresholds. SPOT enables reinforcement learning under both pessimistic and optimistic threshold settings. We prove that our algorithm achieves sublinear regret and constraint violation; i.e., a reward regret of $\tilde{\mathcal{O}}(\sqrt{T})$ while allowing an $\tilde{\mathcal{O}}(\sqrt{T})$ constraint violation over $T$ episodes. The theoretical guarantees show that our algorithm achieves performance comparable to that of an approach relying on fixed and clear thresholds. To the best of our knowledge, SPOT is the first reinforcement learning algorithm that realises theoretical guaranteed performance in an uncertain environment where even thresholds are unknown.
Generative Modelling of Structurally Constrained Graphs
Graph diffusion models have emerged as state-of-the-art techniques in graph generation; yet, integrating domain knowledge into these models remains challenging. Domain knowledge is particularly important in real-world scenarios, where invalid generated graphs hinder deployment in practical applications. Unconstrained and conditioned graph diffusion models fail to guarantee such domain-specific structural properties. We present ConStruct, a novel framework that enables graph diffusion models to incorporate hard constraints on specific properties, such as planarity or acyclicity. Our approach ensures that the sampled graphs remain within the domain of graphs that satisfy the specified property throughout the entire trajectory in both the forward and reverse processes. This is achieved by introducing an edge-absorbing noise model and a new projector operator. ConStruct demonstrates versatility across several structural and edge-deletion invariant constraints and achieves state-of-the-art performance for both synthetic benchmarks and attributed real-world datasets. For example, by incorporating planarity constraints in digital pathology graph datasets, the proposed method outperforms existing baselines, improving data validity by up to 71.1 percentage points.
types considered in this work
Note that in the current implementation we do not distinguish sub-primitive references that point to different parts of a primitive, but rely on the predicted geometric closeness of primitive parts to tell them in the post-process, as we find the geometric predictions are generally quite accurate for this purpose. On the other hand, we note that the extension of references into primitive parts can be trivially achieved by turning primitives into functions and augmenting them with arguments (similar to how we model constraints), such that each argument corresponds to a primitive part; the constraint references can then pinpoint to primitive parts through argument passing (Sec. A.2 Implementation details Sketch encoding format In Sec. 4 we described how sketches are encoded to allow network learning; here we present more implementation details. We encode the input sketch S as a series of primitive tokens followed by a series of constraint tokens, with these tokens supplemented by learned positional encoding according to their indices in this sequence (Sec. We additionally insert learnable START, END and NEW tokens at the front of the sequence, the end of the sequence, as well as between every encoded primitive/constraint respectively, to produce the complete sequence.
A Proof of the strong duality 4
The third inequality follows from identifying that for a given λ, the best policy may be defined pointwise as the argument of the maximum written in the expectation. Thus, only the middle equality () deserves a proof. We obtain it by applying a general theorem of strong duality (which requires feasibility for slightly smaller cost constraints). We restate a result extracted from the monograph by Luenberger [1969]. It relies on the dual functional φ, whose expression we recall below.
Autoregressive Policy Optimization for Constrained Allocation Tasks
Allocation tasks represent a class of problems where a limited amount of resources must be allocated to a set of entities at each time step. Prominent examples of this task include portfolio optimization or distributing computational workloads across servers. Allocation tasks are typically bound by linear constraints describing practical requirements that have to be strictly fulfilled at all times. In portfolio optimization, for example, investors may be obligated to allocate less than 30% of the funds into a certain industrial sector in any investment period. Such constraints restrict the action space of allowed allocations in intricate ways, which makes learning a policy that avoids constraint violations difficult. In this paper, we propose a new method for constrained allocation tasks based on an autoregressive process to sequentially sample allocations for each entity. In addition, we introduce a novel de-biasing mechanism to counter the initial bias caused by sequential sampling. We demonstrate the superior performance of our approach compared to a variety of Constrained Reinforcement Learning (CRL) methods on three distinct constrained allocation tasks: portfolio optimization, computational workload distribution, and a synthetic allocation benchmark. Our code is available at: https://github.com/
Reduced Policy Optimization for Continuous Control with Hard Constraints Shutong Ding Jingya Wang 1 Ye Shi
Recent advances in constrained reinforcement learning (RL) have endowed reinforcement learning with certain safety guarantees. However, deploying existing constrained RL algorithms in continuous control tasks with general hard constraints remains challenging, particularly in those situations with non-convex hard constraints. Inspired by the generalized reduced gradient (GRG) algorithm, a classical constrained optimization technique, we propose a reduced policy optimization (RPO) algorithm that combines RL with GRG to address general hard constraints.
Self-Labeling the Job Shop Scheduling Problem
This work proposes a self-supervised training strategy designed for combinatorial problems. An obstacle in applying supervised paradigms to such problems is the need for costly target solutions often produced with exact solvers. Inspired by semi-and self-supervised learning, we show that generative models can be trained by sampling multiple solutions and using the best one according to the problem objective as a pseudo-label. In this way, we iteratively improve the model generation capability by relying only on its self-supervision, eliminating the need for optimality information. We validate this Self-Labeling Improvement Method (SLIM) on the Job Shop Scheduling (JSP), a complex combinatorial problem that is receiving much attention from the neural combinatorial community. We propose a generative model based on the well-known Pointer Network and train it with SLIM. Experiments on popular benchmarks demonstrate the potential of this approach as the resulting models outperform constructive heuristics and state-of-the-art learning proposals for the JSP. Lastly, we prove the robustness of SLIM to various parameters and its generality by applying it to the Traveling Salesman Problem.
Discrete Object Generation with Reversible Inductive Construction
Ari Seff, Wenda Zhou, Farhan Damani, Abigail Doyle, Ryan P. Adams
The success of generative modeling in continuous domains has led to a surge of interest in generating discrete data such as molecules, source code, and graphs. However, construction histories for these discrete objects are typically not unique and so generative models must reason about intractably large spaces in order to learn. Additionally, structured discrete domains are often characterized by strict constraints on what constitutes a valid object and generative models must respect these requirements in order to produce useful novel samples. Here, we present a generative model for discrete objects employing a Markov chain where transitions are restricted to a set of local operations that preserve validity. Building off of generative interpretations of denoising autoencoders, the Markov chain alternates between producing 1) a sequence of corrupted objects that are valid but not from the data distribution, and 2) a learned reconstruction distribution that attempts to fix the corruptions while also preserving validity. This approach constrains the generative model to only produce valid objects, requires the learner to only discover local modifications to the objects, and avoids marginalization over an unknown and potentially large space of construction histories. We evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics.